Abstract We propose a novel approach for distributed
statistical detection of change-points in high-volume network
traffic. We consider more specifically the task of
detecting and identifying the targets of Distributed De- 
nial of Service (DDoS) attacks. The proposed algorithm, 
called DTopRank, performs distributed network anomaly 
detection by aggregating the partial information gath- 
ered in a set of network monitors. In order to address 
massive data while limiting the communication over- 
head within the network, the approach combines record 
filtering at the monitor level and a nonparametric rank 
test for doubly censored time series at the central decision
site. The performance of the DTopRank algorithm is
illustrated both on synthetic data and on a traffic trace
provided by a major Internet service provider.

Keywords Distributed detection • change-point
detection • rank test • censored data • network anomaly 
detection. 



1 Introduction 

Detecting malevolent behaviors has become a prevalent 
concern for the security of network infrastructures, as 
exemplified by the, now common, attacks against major 
web services providers. In this contribution, we consider 
more specifically the case of DDoS (Distributed Denial 
of Service) type of attacks where many different sources 
transmit data over the network to a few targets so as 
to flood resources and, eventually, cause disruptions in 
service.

Several methods for dealing with DDoS attacks have
been proposed. They can be arranged into two cate- 
gories: signature-based approaches and statistical meth- 
ods. The former operate by comparing the observed 
patterns of network traffic with known attack templates. 
Obviously, this methodology only applies to detecting
anomalies that have already been encountered and
characterized. The second type of approaches relies on 
the statistical analysis of network patterns and can thus 
potentially detect any type of network anomalies. The 
basic statistical modelling for this task is to assume that 
network anomalies lead to abrupt changes in some net- 
work characteristics. Hence, most statistical methods 
for detection of network anomalies are cast in the framework
of statistical change-point detection, which is a familiar
topic in statistics; see, e.g., Basseville and Nikiforov
(1993); Brodsky and Darkhovsky (1993); Csörgő
and Horváth (1997), and references therein.

Two different approaches to change-point detection 
are usually distinguished: the detection can be retro- 
spective and hence with a fixed delay (batch approach) 
or online, with a minimal average delay (sequential ap- 
proach). In the field of network security, a widely used 
change-point detection technique is the cumulative sum
(CUSUM) algorithm described in Basseville and Nikiforov
(1993), which is a sequential approach. It has,
for instance, been used by Wang et al (2002) and by 
Siris and Papagalou (2006) for detecting DoS attacks of 
the TCP (Transmission Control Protocol) SYN flooding 
type. This attack consists in exploiting the TCP three- 
way hand-shake mechanism and its limitation in main- 
taining half-open connections. More precisely, when a 
server receives a SYN packet, it returns a SYN/ACK 
packet to the client. Until the SYN/ACK packet is acknowledged
by the client, the connection remains half-open
for a period of at most the TCP connection
timeout. A backlog queue is built up in the system
memory of the server to maintain all half-open connections,
which leads to a saturation of the server. In Siris
and Papagalou (2006), the authors use the CUSUM 
algorithm to detect a change-point in the time series 
corresponding to the aggregation of the SYN packets 
received by all the requested destination IP addresses. 
With such an approach, it is only possible to set off an 
alarm when a massive change occurs in the aggregated 
series; it is moreover impossible to identify the attacked 
IP addresses. 
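
To make the sequential approach concrete, here is a minimal sketch of a one-sided CUSUM detector applied to an aggregated count series. It is an illustration only and not the exact procedure of Wang et al (2002) or Siris and Papagalou (2006): the function name `cusum_alarm`, the reference level, the drift and the threshold are assumptions chosen for the example.

```python
from typing import Optional, Sequence


def cusum_alarm(counts: Sequence[float], reference: float,
                drift: float, threshold: float) -> Optional[int]:
    """Return the first index at which the one-sided CUSUM statistic
    exceeds `threshold`, or None if no alarm is raised.

    `reference` is the assumed normal traffic level and `drift` a slack
    term; both are hypothetical tuning parameters.
    """
    g = 0.0
    for t, x in enumerate(counts):
        # Accumulate positive deviations from the normal level, reset at 0.
        g = max(0.0, g + (x - reference - drift))
        if g > threshold:
            return t
    return None


# Toy aggregated series: level 10, then a jump to 14 from index 20 onwards.
series = [10.0] * 20 + [14.0] * 20
alarm_index = cusum_alarm(series, reference=10.0, drift=1.0, threshold=8.0)
```

Because the statistic is computed on the aggregated series, an alarm says that something changed but not which destination address is responsible, which is the limitation discussed above.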

Given the nature of a TCP/SYN flooding attack, 
the attacked IP addresses may be identified by apply- 
ing multiple change-point detection tests, considering 
each of the time series formed by counting the num- 
ber of TCP/SYN packets received by individual IP ad- 
dresses. This idea is used in Tartakovsky et al (2006) 
where a multichannel detection procedure, which is a 
refined version of the previously described algorithm, is 
proposed: it makes it possible to detect changes which 
occur in a channel and which could be obscured by the 
normal traffic in the other channels if global statistics 
were used. 

When analyzing wide-area-network traffic, however, 
it is not possible anymore to consider individually all 
the possible target addresses for computational reasons. 
For instance, the data used for the evaluation of the 
proposed method (see Section 3) contains several thou- 
sands of distinct IP addresses in each one-minute time 
slot. In order to detect anomalies in such massive data 
within a reasonable time span, it is impossible to an- 
alyze the time series of all the IP addresses receiving 
TCP/SYN packets. That is why dimension reduction 
techniques have to be used. Three main approaches 
have been proposed. The first one uses Principal Component
Analysis (PCA) techniques, see Lakhina et al
(2004). The second one uses random aggregation (or
sketches), see Krishnamurthy et al (2003), and the third
one is based on record filtering, see Lévy-Leduc and
Roueff (2009). Localization of the anomalies is possible
with the second and third approaches but not with the 
first one. By localization, we mean finding the attacked 
IP addresses. 

In the approaches mentioned above, all the data is 
sent to a central analysis site, called the collector in the 
sequel, in which a decision is made concerning the pres- 
ence of an anomaly. These methods are called central- 
ized approaches. A limitation of these methods is that 
they are not adapted to large networks with massive 
data since, in this case, the communication overhead 
within the network becomes significant. The approach 
that we propose in this paper consists in processing the 
data within the network (in local monitors) in order to
send to the collector only the most relevant data. These
methods are called, in the sequel, decentralized or dis- 
tributed approaches. In Huang et al (2007), a method 
to decentralize the approach of Lakhina et al (2004) 
is considered but, as previously explained, with such a 
method localizing the network anomaly is impossible. 

The main contribution of this paper is an efficient 
way of decentralizing the TopRank algorithm introduced 
in Lévy-Leduc and Roueff (2009). The proposed algorithm,
termed DTopRank (for Distributed TopRank),
uses the TopRank algorithm locally in each monitor
and only sends the most relevant data to the collector.
The data sent by the different local monitors is then
aggregated in a specific way that necessitates the development
of a novel nonparametric rank test for doubly
censored data that generalizes the proposal of Gombay
and Liu (2000). The DTopRank algorithm makes it
possible to achieve a performance that is on a par with
the fully centralized TopRank algorithm while minimizing
the data that needs to be sent from the monitors
to the collector.

The paper is organized as follows. In Section 2, we 
describe the DTopRank method and determine the limit 
in distribution of the proposed test statistic under the 
null hypothesis that there is no network anomaly. The 
performance of the proposed algorithm (implemented
in C) is then assessed both on a real traffic
trace provided by a major Internet Service Provider
(Section 3) and on synthetic data (Section 4). In
both cases, DTopRank is compared to the centralized
TopRank algorithm and to a simpler baseline
decentralized algorithm based on the use of the Bonferroni
correction.



2 Description of the methods 

The raw data that is analyzed consists of flow-level sum- 
maries of the communications on the network. These 
include, for each data flow, the source and destination 
IP addresses, the start and end time of the communi- 
cation as well as the number of exchanged packets. All 
of this information is contained in the standard NetFlow
format.

Depending on the type of anomaly to be detected, 
one needs to consider specific aspects of the network 
traffic. In the case of TCP/SYN flooding, the quantity
of interest is the number of TCP/SYN packets received
by each destination IP address per unit of time.
We denote by $(N_i(t))_{t \geq 1}$ the discrete time series formed
by counting the number of TCP/SYN packets received
by the destination IP address $i$ in the $t$-th sub-interval
of size $\Delta$ seconds, where $\Delta$ is the sampling period.

The centralized TopRank algorithm analyzes these
global packet counts. In our case however, we consider
a monitoring system with a set of local monitors
$M_1, \dots, M_K$, which collect and analyze the locally
observed time series. As a consequence of decentralized
processing, the packets sent to a given destination IP
address are not observed at all monitors, although some
overlap may exist, depending on the routing matrix and
the location of the monitors. We thus denote by $N_i^{(k)}(t)$
the number of TCP/SYN packets transiting to the destination
IP address $i$ in the sub-interval indexed by $t$, as
observed by the $k$-th monitor. In the proposed batch approach,
detection is performed from the data observed
during an observation window of duration $P \times \Delta$ seconds.
The goal is to detect change-points in the aggregated
time series $(N_i(t))_{t \geq 1}$ using only the local time
series $(N_i^{(k)}(t))_{t \geq 1}$ for each $k \in \{1, \dots, K\}$ and a quantity
of data transmitted to the collector that is as small as
possible.

2.1 The DTopRank method 

The DTopRank algorithm operates at two distinct levels:
the local processing step within the local monitors
$M_1, \dots, M_K$ and the aggregation and global change-point
detection step within the collector.

2.1.1 Local processing 

The local processing of DTopRank consists of the four 
steps described below, which are applied in each of the 
K monitors. The first three steps are similar to the
TopRank algorithm applied to the local series of counts
$(N_i^{(k)}(t))_{1 \leq t \leq P}$. The second and third steps are however
modified by introducing a lower censoring value for each
analyzed series so as to make global aggregation possible
at the collector level. In this section, the superscript
$(k)$, corresponding to the monitor index, is dropped to
lighten the notation.

1. Record filtering: For each time index $t \in \{1, \dots, P\}$,
the indices of the $M$ largest counts $N_i(t)$ are recorded
and labeled as $i_1(t), \dots, i_M(t)$ in such a way that
$N_{i_1(t)}(t) \geq N_{i_2(t)}(t) \geq \dots \geq N_{i_M(t)}(t)$. In the sequel,
$\mathcal{T}_M(t)$ denotes the set $\{i_1(t), \dots, i_M(t)\}$. We stress that,
in order to perform the following steps, we only need to
store the variables $\{N_i(t), i \in \mathcal{T}_M(t), t = 1, \dots, P\}$.

2. Creation of censored time series: For each index $i$
selected in the previous step ($i \in \bigcup_{t=1}^{P} \mathcal{T}_M(t)$), the
censored time series is built. This time series is censored
since $i$ does not necessarily belong to the set $\mathcal{T}_M(t)$ for
all indices $t$ in the observation window, in which case
its value $N_i(t)$ is not available and is censored using the
upper bound $N_{i_M(t)}(t) = \min_{j \in \mathcal{T}_M(t)} N_j(t)$. More formally,
the censored time series $(X_i(t), \delta_i(t))_{1 \leq t \leq P}$ are defined,
for each $t \in \{1, \dots, P\}$, by

$$(X_i(t), \delta_i(t)) = \begin{cases} (N_i(t), 1), & \text{if } i \in \mathcal{T}_M(t), \\ \left( \min_{j \in \mathcal{T}_M(t)} N_j(t), 0 \right), & \text{otherwise.} \end{cases}$$

The value of $\delta_i(t)$ indicates whether the corresponding
value $X_i(t)$ has been censored or not. Observe that, by
definition, $\delta_i(t) = 1$ implies that $X_i(t) = N_i(t)$ and $\delta_i(t) = 0$
implies that $X_i(t) \geq N_i(t)$. We also define the upper
and lower bounds of $X_i(t)$ by $\overline{X}_i(t) = X_i(t)$ and
$\underline{X}_i(t) = X_i(t)\,\delta_i(t)$, respectively.

In order to process a fixed number $S$ of time series
instead of all those in $\bigcup_{t=1}^{P} \mathcal{T}_M(t)$ (at most $M \times P$), we
only build the time series corresponding to the first $S$
indices $i$ in the list $i_1(1), \dots, i_1(P), i_2(1), \dots, i_2(P), i_3(1), \dots$,
where the indices $i_k(t)$ are defined in the previous step.

3. Change-point detection test: In Lévy-Leduc and Roueff
(2009), the nonparametric test proposed by Gombay
and Liu (2000) is used for detecting change-points in
censored data. Here, this test is extended in order to
detect change-points in doubly censored time series so
that the same procedure can be applied both in the local
monitors and within the collector. This test, described
hereafter, is applied to each time series created in the
previous stage and the corresponding p-value is computed,
a small value suggesting a potential anomaly.

Let us now further describe the statistical test that
we perform. This procedure aims at testing, from the
observations previously built $(\underline{X}_i(t), \overline{X}_i(t))_{1 \leq t \leq P}$,
whether a change occurred in this time series for a given $i$.
More precisely, if we drop the subscript $i$ for convenience
in the description of the test, the tested hypotheses are:

($H_0$): "$(\underline{X}(t), \overline{X}(t))_{1 \leq t \leq P}$ are independent and identically
distributed."

($H_1$): "There exists some $r$ such that
$((\underline{X}(1), \overline{X}(1)), \dots, (\underline{X}(r), \overline{X}(r)))$ and
$((\underline{X}(r+1), \overline{X}(r+1)), \dots, (\underline{X}(P), \overline{X}(P)))$
have a different distribution."

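To define the proposed test statistic, define, for each
$s, t$ in $\{1, \dots, P\}$,

$$h(s,t) = \mathbb{1}(\underline{X}(s) > \overline{X}(t)) - \mathbb{1}(\overline{X}(s) < \underline{X}(t)) ,$$

where $\mathbb{1}(E) = 1$ in the event $E$ and $0$ in its complementary
set, and

$$Y_s = \frac{U_s}{\sqrt{\sum_{t=1}^{P} U_t^2}} , \quad \text{with } U_s = \sum_{t=1}^{P} h(s,t) . \qquad (1)$$

The test statistic is then given by

$$W_P = \max_{1 \leq t \leq P} \left| \sum_{s=1}^{t} Y_s \right| .$$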

The following theorem, which is proved in the appendix,
provides, under mild assumptions, the limiting distribution
of $W_P$, as $P$ tends to infinity, under the null hypothesis
and thus provides a way of computing the p-values
of the test.

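Theorem 1 Let $(\underline{X}, \overline{X})$ be an $\mathbb{R}^2$-valued random vector
such that

$$\mathbb{P}\left( F(\underline{X}^-) + G(\overline{X}) = 1 \right) < 1 , \qquad (2)$$

where $F$ is the c.d.f. of $\overline{X}$, $G$ the c.d.f. of $\underline{X}$ and $F(x^-)$
denotes the left limit of $F$ at point $x$. Let $(\underline{X}(t), \overline{X}(t))_{1 \leq t \leq P}$
be i.i.d. random vectors having the same distribution as
$(\underline{X}, \overline{X})$. Then, as $P$ tends to infinity,

$$\sup_{0 \leq u \leq 1} \left| \sum_{s=1}^{\lfloor Pu \rfloor} Y_s \right| \xrightarrow{d} B^* := \sup_{0 \leq u \leq 1} |B(u)| , \qquad (3)$$

where $\{B(u), 0 \leq u \leq 1\}$ denotes the Brownian bridge
and $\xrightarrow{d}$ refers to convergence in distribution.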

Remark 1 A simple sufficient condition for ensuring (2)
is that the probability of not being censored ($\underline{X} = \overline{X}$) is
positive, and that, on this event, the observation may
take at least two different values with positive probability.
This formally means that there exists $u$ in $\mathbb{R}$ such
that

$$0 < \mathbb{P}(\underline{X} = \overline{X} \leq u) < 1 .$$

Theorem 1 of this paper thus extends Theorem 1 in
Gombay and Liu (2000) where only one-sided censoring
is considered and continuity of the random variables is
assumed.

Remark 2 In practice, the computation of the quantities
$(\sum_{s=1}^{t} Y_s)_{1 \leq t \leq P}$ can be done in $O(P)$ operations only
using the alternate form of $U_s$ in terms of the empirical
cumulative distribution functions of $\overline{X}(t)$ and $\underline{X}(t)$ (see
Eq. (7) in the appendix).

Based on (3), we take for the change-point detection
test the following p-value: $Pval(W_P)$, where for all
positive $b$ (see, for instance, Billingsley, 1968, p. 85),

$$Pval(b) = \mathbb{P}(B^* > b) = 2 \sum_{j=1}^{\infty} (-1)^{j-1} e^{-2 j^2 b^2} .$$

4. Selection of the data to be transmitted to the collector:
We select in each monitor the $d$ censored time series
having the smallest p-values and send them to the collector.
Thus, the collector receives $d \times K$ censored time
series, instead of the $\sum_{k=1}^{K} D_k$ series that a centralized
approach would require, where $D_k$ is the number of
destination IP addresses seen by the $k$-th monitor.
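
Steps 1 and 2 of the local processing can be sketched as follows. This is a minimal illustration, assuming that the local counts are held in memory as one dictionary per sub-interval (destination address to count), that at least one address is seen in every sub-interval, and that ties are broken arbitrarily. The function names are hypothetical and the restriction to the first $S$ series is omitted.

```python
from typing import Dict, List, Tuple

Counts = Dict[str, int]  # destination address -> SYN count in one sub-interval


def record_filter(window: List[Counts], m: int) -> List[Counts]:
    """Step 1: keep, for each sub-interval, only the m largest counts."""
    kept = []
    for counts in window:
        top = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:m]
        kept.append(dict(top))
    return kept


def censored_series(kept: List[Counts], address: str) -> Tuple[List[int], List[int]]:
    """Step 2: lower and upper bounds of the count series of `address`.

    When the address is among the retained ones, both bounds equal its count.
    Otherwise the count is only known to lie between 0 and the smallest
    retained count of that sub-interval.
    """
    lower, upper = [], []
    for top in kept:
        if address in top:
            lower.append(top[address])
            upper.append(top[address])
        else:
            lower.append(0)
            upper.append(min(top.values()))
    return lower, upper


# Toy window with P = 3 sub-intervals and M = 2 retained addresses.
window = [
    {"a": 9, "b": 5, "c": 1},
    {"a": 8, "b": 2, "c": 4},
    {"a": 7, "b": 6, "c": 3},
]
kept = record_filter(window, m=2)
lower_b, upper_b = censored_series(kept, "b")
```

In the second sub-interval the address "b" is not among the two largest counts, so its value is bracketed between 0 and the smallest retained count, which is the double censoring handled by the test of step 3.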



2.1.2 Aggregation and change-point detection test in
the collector

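Within the collector, the lower and upper bounds of
the aggregated time series $(\underline{Z}_i(t), \overline{Z}_i(t))_{1 \leq t \leq P}$ associated
to the IP address $i$ are then built as follows:

$$\underline{Z}_i(t) = \sum_{k=1}^{K} \underline{X}_i^{(k)}(t) \quad \text{and} \quad \overline{Z}_i(t) = \sum_{k=1}^{K} \overline{X}_i^{(k)}(t) , \qquad (4)$$

where $(\underline{X}_i^{(k)}(t), t = 1, \dots, P)$ and $(\overline{X}_i^{(k)}(t), t = 1, \dots, P)$ are
the time series associated to the IP address $i$ created in
the monitor $M_k$. Then, we apply the test described in
step 3 of the local processing to the time series
$(\underline{Z}_i(t), t = 1, \dots, P)$ and $(\overline{Z}_i(t), t = 1, \dots, P)$. An IP address $i$ is thus
claimed to be attacked at a given false alarm rate
$\alpha \in (0,1)$ if $Pval(W_P) < \alpha$, and the change-point time is
estimated with $\hat{r} = \arg\max_{1 \leq t \leq P} |S_t|$, where $S_t = \sum_{s=1}^{t} Y_s$.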
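
The rank test of step 3, which is also the one applied in the collector to the aggregated bounds, can be sketched as follows. The sketch assumes that $h(s,t)$ compares the lower bound at time $s$ with the upper bound at time $t$ (and conversely) and that $Y_s$ is $U_s$ divided by the square root of the sum of the squared $U_t$. It computes the $U_s$ with a direct quadratic-time double loop rather than the faster form mentioned in Remark 2. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def brownian_bridge_pvalue(b: float) -> float:
    """P(sup |B(u)| > b), summing the alternating series until terms vanish."""
    if b < 0.1:
        # The series converges slowly here and the value is 1 up to rounding.
        return 1.0
    total, j = 0.0, 1
    while True:
        term = math.exp(-2.0 * j * j * b * b)
        total += term if j % 2 == 1 else -term
        if term < 1e-16:
            break
        j += 1
    return min(1.0, max(0.0, 2.0 * total))


def rank_test(lower: Sequence[float], upper: Sequence[float]) -> Tuple[float, int]:
    """Return (p-value, estimated change-point time, counted from 1)."""
    p = len(lower)
    # U_s = sum over t of 1(lower[s] > upper[t]) - 1(upper[s] < lower[t])
    u = [sum((lower[s] > upper[t]) - (upper[s] < lower[t]) for t in range(p))
         for s in range(p)]
    norm = math.sqrt(sum(x * x for x in u))
    if norm == 0.0:  # no pair of observations can be ordered
        return 1.0, 0
    partial, best, best_t = 0.0, 0.0, 0
    for t, x in enumerate(u, start=1):
        partial += x / norm  # partial sum of the Y_s
        if abs(partial) > best:
            best, best_t = abs(partial), t
    return brownian_bridge_pvalue(best), best_t


def aggregate(bounds: Sequence[Sequence[float]]) -> List[float]:
    """Sum, time by time, the bounds received from the monitors."""
    return [sum(values) for values in zip(*bounds)]


# Two monitors reporting uncensored series with a jump after t = 30.
z_lower = aggregate([[1.0] * 30 + [4.0] * 30, [2.0] * 30 + [3.0] * 30])
z_upper = list(z_lower)
p_value, change_point = rank_test(z_lower, z_upper)
```

In this toy case the lower and upper bounds coincide, so the test reduces to a rank test on fully observed data; with censored observations, pairs whose intervals overlap simply contribute zero to $U_s$.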

2.2 The BTopRank method 

In the sequel, the DTopRank algorithm is compared
with a simpler approach using, instead of the aggregation
step, a simple Bonferroni correction of the p-values
determined in each monitor. More precisely, in BTopRank
an IP address is claimed to be attacked at the
level $\alpha \in (0,1)$ within the collector if at least one local
monitor has computed a p-value smaller than $\alpha/K$,
namely if $K \left( \inf_{1 \leq k \leq K} Pval_k \right) < \alpha$, $Pval_k$ being the p-value
computed in the monitor $k$.
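
The BTopRank decision rule amounts to a short computation. The sketch below uses hypothetical function names and caps the corrected value at 1, which the text does not state; the example p-values are invented for illustration.

```python
from typing import Sequence


def bonferroni_pvalue(local_pvalues: Sequence[float], n_monitors: int) -> float:
    """K times the smallest local p-value, capped at 1 (capping is an assumption)."""
    return min(1.0, n_monitors * min(local_pvalues))


def btoprank_alarm(local_pvalues: Sequence[float], n_monitors: int,
                   alpha: float) -> bool:
    """Flag the address when the corrected p-value falls below alpha."""
    return bonferroni_pvalue(local_pvalues, n_monitors) < alpha


# Three of K = 15 monitors transmitted a series for this address (toy values).
flagged = btoprank_alarm([0.02, 0.004, 0.3], n_monitors=15, alpha=0.05)
```

Here the smallest local p-value, 0.004, would be significant on its own at the 5% level, but it is no longer so once multiplied by the number of monitors.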

3 Application to real data 

This section summarizes the results obtained by the 
DTopRank and BTopRank algorithms applied to an ac- 
tual Internet traffic trace provided by a major Internet 
service provider. 

3.1 Description of the data 

We consider the data used in Section 4 of Lévy-Leduc
and Roueff (2009), which corresponds to a recording of
118 minutes of ADSL (Asymmetric Digital Subscriber
Line) and Peer-to-Peer (P2P) traffic to which some
TCP/SYN flooding type attacks have been added. As
this data set does not contain full routing information,
it has been artificially distributed over a set of
virtual monitors as follows: the data is shared among
$K = 15$ monitors by assigning each source-destination
pair (source IP address, destination IP address) to a
randomly chosen monitor; a single monitor thus records
all the flows between two particular IP addresses. The
experiments reported below are based on 50 independent
replications of this process. Finally, the existing
anomalies have been down-sampled (by randomly dropping
packets involved in the attacks) to 12.5 and 25
packets/s, respectively, to explore more difficult detection
scenarios.

Fig. 1 Number of TCP/SYN packets globally exchanged (top) and received by the 4 attacked IP addresses (bottom) in the
original data (a,d) and within two particular monitors (b, c, e, f). Note that the scale of the bottom figures is divided by 20
with respect to the top ones. Panels: (a) Global traffic, (b) Traffic in monitor 1, (c) Traffic in monitor 2, (d) Attacks,
(e) Attacks in monitor 1, (f) Attacks in monitor 2.
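
The artificial distribution of the trace over virtual monitors can be sketched as follows. This is only an illustration of the assignment rule described in this section: the flow record layout, the function name `split_flows` and the seed handling are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Flow = Tuple[str, str, int]  # (source IP, destination IP, sub-interval index)


def split_flows(flows: Iterable[Flow], n_monitors: int,
                seed: int = 0) -> Dict[int, List[Flow]]:
    """Assign every (source, destination) pair to one random monitor."""
    rng = random.Random(seed)
    owner: Dict[Tuple[str, str], int] = {}
    per_monitor: Dict[int, List[Flow]] = defaultdict(list)
    for flow in flows:
        pair = (flow[0], flow[1])
        if pair not in owner:
            # First time this pair is seen: draw its monitor once and for all.
            owner[pair] = rng.randrange(n_monitors)
        per_monitor[owner[pair]].append(flow)
    return per_monitor


flows = [("s1", "d1", 0), ("s1", "d1", 1), ("s2", "d1", 0), ("s1", "d2", 2)]
shares = split_flows(flows, n_monitors=15, seed=1)
```

Changing the seed gives a new replication of the assignment. Traffic towards a single destination is typically split across several monitors, since each of its sources is assigned independently.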

Figure 1-(a) displays the total number of TCP/SYN
packets received during each second by the different
requested IP addresses. The number of TCP/SYN packets
received by the four attacked destination IP addresses
is displayed in (d) (12.5 packets/s case). As
we can see from this figure, the first attack occurs at
around 2000 seconds, the second at around 4000 seconds,
the third at around 6000 seconds and the last
one at around 6500 seconds. These attacks produce 33
ground-truth anomalies, that is, abrupt increases or decreases
of the signal. Figures 1-(b), (c) display the number of
TCP/SYN packets globally exchanged within two different
monitors whereas (e), (f) focus on the traffic received
by the attacked IP addresses within these two
monitors.

The attacked IP addresses (bottom part of Figure 1) 
are completely hidden in the global TCP/SYN traffic 
(top part of Figure 1) and thus very difficult to detect.
Note also that 1006000 destination IP addresses
are present in this data set, with an average of 15000
destination IP addresses in each of the 118 one-minute
observation windows. Hence, real time processing of the
data would not be possible, even at the monitor level,
without a dimension reduction step such as record filtering.



3.2 Performance of the methods 

In what follows, the DTopRank algorithm is used with
the same parameters as those adopted in Lévy-Leduc
and Roueff (2009) for the TopRank algorithm, with one-minute
windows divided in $P = 60$ subintervals of $\Delta = 1$ s,
with $M = 10$ and $S = 60$.

Figures 2 and 3 show the benefits of the aggregation
stage within the collector of the DTopRank algorithm
with respect to the use of the simple Bonferroni correction
in the BTopRank algorithm. Figures 2-(a), (b)
and (c) display the time series $(\underline{X}(t), t = 1, \dots, P)$ and
$(\overline{X}(t), t = 1, \dots, P)$ associated to an attacked IP address
in three different monitors as well as the corresponding
p-values. Figure 2-(d) displays the aggregated time series
$(\underline{Z}(t), t = 1, \dots, P)$ and $(\overline{Z}(t), t = 1, \dots, P)$, as defined
in (4), as well as the associated p-value. Note that the
aggregated time series corresponds to the aggregation
of 11 time series created by 11 different monitors where
the attacked IP address has been detected. The p-value
of the aggregated time series is much smaller than the
ones determined at the local monitors, which enables
the detection of an attack which would be difficult to
detect within the local monitors.

Fig. 2 (a), (b), (c): time series $(\underline{X}_i^{(k)}(t), t = 1, \dots, 60)$ and
$(\overline{X}_i^{(k)}(t), t = 1, \dots, 60)$ displayed with ('x') and ('o') respectively,
for 3 different values of $k$; (d): $(\underline{Z}_i(t), t = 1, \dots, 60)$ and
$(\overline{Z}_i(t), t = 1, \dots, 60)$ displayed with ('x') and ('o') respectively.



Figure 3 displays on the x and y-axes the quantities
$Pval_{DTop}$ and $Pval_{BTop}$, respectively. For a given
IP address, $Pval_{DTop}$ corresponds to the p-value computed
with DTopRank and $Pval_{BTop}$ is obtained
by applying the Bonferroni correction to the p-values
transmitted by the monitors. DTopRank provides
smaller p-values than the Bonferroni approach for IP
addresses that were really attacked and p-values of the
same order as $\inf_{1 \leq k \leq K} Pval_k$ for the other IP addresses.

Fig. 3 ($Pval_{DTop}$, $Pval_{BTop}$) displayed with ('.') except for
the ground-truth attacked IP addresses which are displayed
with ('•').



In Figure 4 the influence on the DTopRank algorithm
of the number $d$ of transmitted series is investigated.
Figure 4 displays ROC curves for the DTopRank
algorithm with the average rate of false alarm on the
x-axis and the average rate of right detection on the y-axis
for different values of $d$ ($d = 1, 5, 10$) computed from
50 Monte-Carlo replications. Each replication consists
in randomly assigning each pair (source IP, destination IP)
to a monitor. In Figure 4, the extra information brought
by larger values of $d$ most noticeably contributes to an
increased number of false detections. This behavior is
partly due to the presence of at most one anomaly per
observation window in our data set. Larger values of $d$
may actually be preferable in cases where the average
number of attacks to be detected is higher. In the following,
we fix the value of $d$ to $d = 1$.

Fig. 4 Influence of the parameter $d$ on the performance of
the DTopRank algorithm.



The DTopRank and BTopRank algorithms are further
compared in Figure 5 which displays the ROC
curves obtained using these two methods with 50 Monte-Carlo
replications in two different cases. The bottom
plot deals with attacks having an intensity of 12.5 SYN/s.
In the other situation, the attacks are the same except
that their intensity is 25 SYN/s. For comparison purposes,
the ROC curve associated to the non-distributed
TopRank algorithm is also displayed in both situations.
Figure 5 shows that for the 25 SYN/s attacks, the three
methods give similar results. However, in the more difficult
case of the 12.5 SYN/s attacks, the DTopRank
algorithm outperforms the BTopRank algorithm.

Thus, DTopRank performs very similarly to the centralized
algorithm, especially in the range of interest
where the false alarm rate is about 1e-4 (recall that
there are about 15000 different IP addresses in each
one-minute window). The quantity of data exchanged
within the network is however much reduced as the centralized
algorithm needs to obtain information about,
on average, 34000 flows per minute whereas the DTopRank
algorithm only needs to transmit the $d$ upper
and lower censored time series from the monitors to
the collector. For $d = 1$ and $K = 15$, this amounts to
1800 scalars that need to be transmitted to the collector,
versus $34000 \times 5$ (start and end time stamps, source and
destination IP, number of SYN packets for each flow)
for the centralized algorithm, resulting in a reduction of
almost two orders of magnitude of the data that needs
to be transmitted over the network.
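
The figures quoted in this comparison follow from a short count. Assuming that each transmitted series consists of a lower and an upper bound of $P = 60$ values each (the window length used in this section), the volumes per one-minute window are:

```latex
\begin{align*}
\text{DTopRank:} \quad & d \times K \times 2 \times P = 1 \times 15 \times 2 \times 60 = 1800 \text{ scalars}, \\
\text{centralized:} \quad & 34000 \times 5 = 170000 \text{ scalars}, \\
\text{ratio:} \quad & 170000 / 1800 \approx 94 .
\end{align*}
```

A factor of about 94 is slightly below $10^2$, consistent with a reduction of almost two orders of magnitude.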

Fig. 5 ROC curves for the DTopRank, BTopRank and
TopRank algorithms for attacks having intensities of 25
SYN/s (top) and 12.5 SYN/s (bottom).

4 Application to synthetic data

In this section, we provide results obtained on simulated
data with two specific goals in mind. First, the
traffic trace used in Section 3 contains generated
attacks but is not fully labeled. Hence, it could be the
case that non-labeled anomalies are already present in
the background ADSL and P2P traffic, contributing to
a slight overestimation of false alarms (see Lévy-Leduc
and Roueff, 2009). Second, the random decentralization
approach used in Section 3 does not necessarily correspond
to a realistic network topology. In this section,
we thus consider synthetic high-dimensional data corresponding
to an idealized minute of traffic containing a
single anomaly, as measured by 15 monitors randomly
positioned on a plausible network topology.

4.1 Description of the data

A network topology is generated in which synthesized
traffic between hosts located in the nodes of that network
is injected. We first generate an Erdős-Rényi random
graph (Erdős and Rényi, 1959) with 15 nodes and
a probability of edge creation of 0.15. The generated
graph is displayed in Figure 6. It is similar in terms of
number of nodes or node degrees to the Abilene network,
which has been widely considered in the context
of network anomaly detection, see Lakhina et al (2004)
and Huang et al (2007). This graph has been generated
once and is used for all replications of the Monte-Carlo
simulations that will follow. For each Monte-Carlo replication,
a node of the graph is randomly assigned to each
of the $D = 1000$ IP addresses and $K = 15$ monitors are
also randomly positioned on 15 of the 24 edges of the
graph, see Figure 6.

Fig. 6 Generated graph: nodes are displayed with circled
numbers and monitors with colored boxes.

Using Dijkstra's shortest path algorithm (Dijkstra, 1959),
the routes between each pair of nodes of the network are
computed, that is, the lists of edges of the graph that form
the path between the nodes. These routes are used to
determine which monitors will see the traffic between
two hosts. Note that in our procedure, we have deliberately
not considered the capacity of network links, which would
otherwise call for more sophisticated dynamic routing
algorithms; this is beyond the scope of this contribution.
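
The topology generation and routing steps can be sketched as follows. This is a minimal illustration under stated assumptions: all links are given unit cost, so that a breadth-first search yields a shortest path of the same length as Dijkstra's algorithm, and the function names and the data layout (edges as frozensets, monitors as a mapping from edge to monitor index) are hypothetical.

```python
import random
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Set

Edge = FrozenSet[int]


def erdos_renyi(n: int, p: float, rng: random.Random) -> Set[Edge]:
    """Draw each of the n(n-1)/2 possible edges independently with probability p."""
    return {frozenset((a, b)) for a in range(n) for b in range(a + 1, n)
            if rng.random() < p}


def shortest_route(edges: Set[Edge], src: int, dst: int) -> Optional[List[Edge]]:
    """Edges of one shortest path from src to dst (None if disconnected)."""
    neighbours: Dict[int, List[int]] = {}
    for e in edges:
        a, b = tuple(e)
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    parent = {src: src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in neighbours.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    if dst not in parent:
        return None
    route = []
    while dst != src:  # walk back from the destination to the source
        route.append(frozenset((dst, parent[dst])))
        dst = parent[dst]
    return route[::-1]


def observing_monitors(route: List[Edge], monitored: Dict[Edge, int]) -> Set[int]:
    """Monitors placed on an edge of the route see the corresponding traffic."""
    return {monitored[e] for e in route if e in monitored}


# Small fixed example: a path 0 - 1 - 2 - 3 with monitors on two of its edges.
edges = {frozenset((0, 1)), frozenset((1, 2)), frozenset((2, 3))}
monitored = {frozenset((0, 1)): 0, frozenset((2, 3)): 1}
route = shortest_route(edges, 0, 3)
seen_by = observing_monitors(route, monitored)
```

A call such as `erdos_renyi(15, 0.15, random.Random(0))` draws a graph with the parameters quoted in the text; a random graph of this kind may be disconnected, in which case `shortest_route` returns `None` for some pairs of nodes.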

The traffic injected in this network is generated as
follows. For a given source-destination IP address pair $(i, j)$,
we follow Lévy-Leduc and Roueff (2009) and model
the SYN packet traffic using a Poisson point process
with a given intensity $\theta_{i,j}$, expressed as the number
of SYN packets received by $j$ per sub-interval of the
observation window. In network applications, different
source-destination pairs exchange very different
amounts of traffic. Hence we shall use different intensities
for each pair of hosts. To take into account this
diversity, we propose using the realizations of a Pareto
distribution for the parameters of the different intensities
so that a lot of machines receive a small number of
SYN packets while a few receive a lot. Note that Nucci
et al (2005) similarly use a heavy-tailed distribution to
generate network traffic.

We first randomly generate a sequence $(\mu_k)_{1 \leq k \leq N}$
of intensities with the Pareto distribution having the
following density: $\gamma\alpha / (1 + \gamma x)^{1+\alpha}$, when $x > 0$, with
$\alpha = 2.5$ and $\gamma = 0.72$, which roughly corresponds to what we
observed in the (centralized) real traffic traces used in
Section 3. The parameters $\mu_k$ are assumed to be sorted
as follows: $\mu_1 \geq \dots \geq \mu_N$.
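
The intensities can be drawn by inverting the c.d.f., as in the sketch below. It assumes that the density is $\gamma\alpha/(1+\gamma x)^{1+\alpha}$, whose c.d.f. is $F(x) = 1 - (1+\gamma x)^{-\alpha}$; the function names are hypothetical.

```python
import random
from typing import List

ALPHA, GAMMA = 2.5, 0.72  # values quoted in the text


def pareto_quantile(u: float, alpha: float = ALPHA, gamma: float = GAMMA) -> float:
    """Inverse of the c.d.f. F(x) = 1 - (1 + gamma x)^(-alpha), for 0 <= u < 1."""
    return ((1.0 - u) ** (-1.0 / alpha) - 1.0) / gamma


def draw_intensities(n: int, rng: random.Random) -> List[float]:
    """n intensities drawn by inversion and sorted in decreasing order."""
    return sorted((pareto_quantile(rng.random()) for _ in range(n)), reverse=True)


mean = 1.0 / (GAMMA * (ALPHA - 1.0))        # closed-form mean, valid for alpha > 1
exceeded_40_percent = pareto_quantile(0.6)  # value exceeded with probability 0.4
mu = draw_intensities(10100, random.Random(0))
```

Under this reading of the density, the mean is about 0.93 and the value exceeded with probability 0.4 is about 0.61. Since the $\mu_k$ are sorted in decreasing order, intensities whose rank is close to $0.4 N$ are expected to lie near that value.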

Here, $(X_{i,j}(t))_{1 \leq t \leq P}$ corresponds to the number of SYN
packets sent by $i$ and received by $j$ in each of the $P$ sub-intervals
of the observation window, where $i \neq j$ are in
$\{1, \dots, D\}$. Among these time series, $N_a$ of them correspond
to the traffic received by the attacked destination
IP address $j_0$, which is assigned to a fixed location,
in node 7, at the "edge" of the network (see Figure 6).
This traffic, which is sent by source IP addresses $i$ belonging
to a randomly chosen subset $\mathcal{A}_a$ of $\{1, \dots, D\}$,
is generated as follows:

$$\forall i \in \mathcal{A}_a , \quad (X_{i,j_0}(t))_{1 \leq t \leq \tau} \sim \text{Poisson}(\theta_{i,j_0}) ,$$

and

$$\forall i \in \mathcal{A}_a , \quad (X_{i,j_0}(t))_{\tau < t \leq P} \sim \text{Poisson}(\eta\,\theta_{i,j_0}) ,$$

where $\eta$ is a positive number which modulates the change
intensity, $\tau$ is the change-point instant and $(\theta_{i,j_0})_{i \in \mathcal{A}_a}$
are chosen in $(\mu_k)_{40 N_a \leq k \leq 41 N_a}$. The $(\theta_{i,j_0})_{i \in \mathcal{A}_a}$ are thus chosen
around 0.6 (upper 0.4-quantile of the Pareto distribution with
parameters $\alpha$ and $\gamma$, whose mean is about 0.93). Hence,
the attack to be detected consists of a multiplicative
increase in intensity of $N_a$ attacker sources, whose intensity
is otherwise in the bulk of the distribution of
the intensity (close to the upper 0.4-quantile). The remaining
background traffic is generated as:

$$\forall i \in \{1, \dots, D\} , \; j \neq j_0 , \quad (X_{i,j}(t))_{1 \leq t \leq P} \sim \text{Poisson}(\theta_{i,j}) ,$$

where $(\theta_{i,j})_{i \in \{1, \dots, D\}, j \neq j_0}$ are chosen randomly in the
remaining values of $\mu_k$: $(\mu_k)_{k \notin [40 N_a; 41 N_a]}$.
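
The generation of the attack traffic can be sketched as follows. For simplicity the sketch gives every attacking source the same intensity 0.6 instead of drawing it among the $\mu_k$, and it uses the multiplier 1.5 and a change-point at sub-interval 30 as example values. The Poisson sampler is Knuth's multiplication method, and the function names are hypothetical.

```python
import math
import random
from typing import List


def poisson_draw(rate: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit, k, prod = math.exp(-rate), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def attacked_series(theta: float, eta: float, tau: int, p: int,
                    rng: random.Random) -> List[int]:
    """Counts with rate theta up to sub-interval tau, then eta * theta."""
    return [poisson_draw(theta if t <= tau else eta * theta, rng)
            for t in range(1, p + 1)]


rng = random.Random(0)
# Traffic sent to the target by 100 attacking sources, summed per sub-interval.
per_source = [attacked_series(0.6, 1.5, tau=30, p=60, rng=rng) for _ in range(100)]
target_counts = [sum(column) for column in zip(*per_source)]
```

With these values the expected total received by the target goes from about 60 to about 90 packets per sub-interval at the change-point. A single monitor only sees the part of this traffic that is routed through its link, so the jump it observes can be much smaller.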

In the experiments below, $N = 10100$, $N_a = 100$, $P = 60$,
$\tau = 30$ and we consider different values for the parameter
$\eta$ (1.2, 1.5) in order to modulate the detection
difficulty. With such a choice of parameters, we simulate
a DDoS-type attack against $j_0$: the attack is generated by
a large number $N_a$ of source hosts coming from no particular
place in the network. Moreover, since this traffic
is shared by the different monitors, this attack can be
locally (within a monitor) very difficult to distinguish
from the background traffic as can be seen in Figure 7.

Figure 7 displays for each monitor, when $\eta = 1.5$, an example
of the time series formed by the number of packets
received by the first ("x") and 10th ("•") most solicited
destination IP address at each sub-interval as well as
the time series of the attacked address $j_0$ (">"). The
monitors that have not detected any traffic directed to
$j_0$ were omitted. In (d), (e) and (i), $j_0$ is detected by
the monitor, but the number of packets is never high
enough to be selected by the record filtering step and
to appear in $\{\mathcal{T}_M(t), t = 1, \dots, 60\}$. Hence in these monitors,
no change detection test is performed for $j_0$. In
the other six figures, all steps of the TopRank algorithm
are carried out since the number of packets sent to $j_0$
is high enough. A special case is shown in (a), which
displays the time series in the monitor located on the
edge between nodes 10 and 7, see Figure 6, which is
the link where all the traffic directed to the attacked IP
address $j_0$ appears.

4.2 Performance of the methods 

The two methods described in Section 2 are compared 
by computing their false alarm and detection rates when 
tested on 1000 Monte-Carlo replications of the synthetic 
data described in Section 4.1. The left plot of Figure 8
displays the corresponding ROC curves for different
values of $\eta$ (1.2 and 1.5); the solid and dashed lines show
the results of the DTopRank and BTopRank algorithms,
respectively.
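
For readers who want to reproduce this kind of comparison, the sketch below shows one common way of turning Monte Carlo outputs into ROC points. It assumes that each replication yields a single p-value and that an alarm is raised when this p-value falls below a threshold; the function name `roc_points` and the sample p-values are hypothetical and are not taken from the experiments.

```python
def roc_points(pvalues_null, pvalues_attack, thresholds):
    """Return one (false_alarm_rate, detection_rate) pair per threshold.

    pvalues_null:   p-values from replications without attack.
    pvalues_attack: p-values from replications containing an attack.
    An alarm is raised whenever the p-value is at most the threshold.
    """
    points = []
    for alpha in thresholds:
        false_alarm = sum(p <= alpha for p in pvalues_null) / len(pvalues_null)
        detection = sum(p <= alpha for p in pvalues_attack) / len(pvalues_attack)
        points.append((false_alarm, detection))
    return points


# Hypothetical p-values, for illustration only.
null_runs = [0.5, 0.9, 0.02, 0.3]
attack_runs = [0.01, 0.03, 0.2, 0.001]
curve = roc_points(null_runs, attack_runs, thresholds=[0.01, 0.05, 0.25, 1.0])
```

Sweeping the threshold from 0 to 1 traces the curve from (0, 0) to (1, 1); a method dominates another when its curve lies above it.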

For the larger value of $\eta$ (1.5), both methods perform
very well, with few missed attacks at a low false alarm
rate. DTopRank yields slightly better results than
BTopRank. For $\eta = 1.2$, for which attacks are more
difficult to detect, the detection performance is
naturally lower for both algorithms; the toll is however
heavier on BTopRank than on DTopRank.

We observed that the detection performance was
improved for Monte Carlo runs in which a monitor is
assigned to the 7-10 edge of Figure 6. Indeed, in this
case, at least one monitor has access to all the traffic sent
to the attacked IP address sitting at node $j_0 = 7$. The
right plot of Figure 8 corresponds to the case where
this configuration is avoided in the Monte Carlo
simulation, which gives some idea of the significance of the
phenomenon. For a given monitor topology, the detection
performance is thus better for target addresses located
at the edge of the network, behind a monitor. In
the opposite case however, the detection performance is
still appreciable due to the aggregation, at the collector
level, of the information sent by the monitors.

Fig. 7 Time series formed in 9 monitors by the number of packets received by the first ("x") and 10th ("•") most solicited
destination IP addresses at each sub-interval, as well as the time series of the attacked address $j_0$ (">").

Fig. 8 Left: ROC curves for DTopRank (solid lines) and BTopRank (dashed lines), for $\eta$ = 1.2 ("•") and 1.5 ("A"). Right: similar
simulation when no monitor is allowed on the 10-7 edge.



5 Conclusion 

In this paper, we proposed a distributed method for
detecting and localizing DDoS attacks in Internet
traffic. With this approach, local processing based on a
record filtering technique followed by a nonparametric
rank test is performed within the local monitors. Only
the censored time series of IP addresses corresponding
to the smallest p-values are transmitted and aggregated
in the collector. The processing carried out both
in the monitors and in the central collector is also
sufficiently simple to make real-time implementation
possible. Compared with the use of purely local detectors,
the proposed algorithm has been shown to reveal at- 
tacks which are not locally detectable. Indeed, the sta- 
tistical performance of the proposed approach is close to 
that achieved by the fully centralized detector but with 
a greatly reduced communication overhead. An addi- 
tional interesting feature of the proposed aggregation 
and detection mechanism is the fact that it operates 
similarly at the monitor and collector level. Hence, the
test could also be applied hierarchically, with
tree-structured monitors, so as to produce decisions
corresponding to groups of monitors of different granularity in the
network.

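A Appendix

A.1 Proof of Theorem 1

The following proof is based on (Billingsley, 1968, Theorem
24.2), which asserts that if $\xi_1, \dots, \xi_n$ are exchangeable random
variables (each permutation of the set of variables has the
same joint distribution) and satisfy, as $n \to \infty$,

$$\sum_{i=1}^{n} \xi_i \xrightarrow{p} 0, \qquad \sum_{i=1}^{n} \xi_i^2 \xrightarrow{p} 1, \qquad \max_{1 \le i \le n} |\xi_i| \xrightarrow{p} 0, \tag{5}$$

then $\{\sum_{i=1}^{\lfloor nt \rfloor} \xi_i,\ 0 \le t \le 1\} \xrightarrow{d} \{B(t),\ 0 \le t \le 1\}$, as $n \to \infty$, where
B is a Brownian bridge.

We apply this theorem to the random variables $Y_1, \dots, Y_P$,
defined in (1), which are exchangeable since $(\underline{X}(i), \overline{X}(i))_{1 \le i \le P}$
are i.i.d. random vectors. Let us now check the three
conditions in (5). By the anti-symmetry of the kernel h,

$$\sum_{i=1}^{P} U_i = \sum_{i=1}^{P} \sum_{j=1}^{P} h(X(i), X(j)) = 0,$$

which gives the first condition of the theorem. The second
one follows from the definition of $Y_i$:

$$\sum_{i=1}^{P} Y_i^2 = \sum_{i=1}^{P} \frac{U_i^2}{\sum_{j=1}^{P} U_j^2} = 1.$$

To check the third condition, denote by $F_P$ (resp. $G_P$) the
empirical c.d.f. of $\overline{X}(1), \dots, \overline{X}(P)$ (resp. $\underline{X}(1), \dots, \underline{X}(P)$):

$$F_P(t) = P^{-1} \sum_{i=1}^{P} 1(\overline{X}(i) \le t) \quad \text{and} \quad G_P(t) = P^{-1} \sum_{i=1}^{P} 1(\underline{X}(i) \le t). \tag{6}$$

Note that

$$\frac{1}{P} U_i = \frac{1}{P} \sum_{j=1}^{P} \left\{ 1(\underline{X}(i) > \overline{X}(j)) - 1(\overline{X}(i) < \underline{X}(j)) \right\} = F_P(\underline{X}(i)^-) - \{1 - G_P(\overline{X}(i))\} = F_P(\underline{X}(i)^-) - \bar{G}_P(\overline{X}(i)), \tag{7}$$

where $\bar{G}_P(\cdot) = 1 - G_P(\cdot)$. Then, using the Glivenko-Cantelli
Theorem (van der Vaart, 1998, Theorem 19.1), we get, as
P tends to infinity, that

$$\frac{1}{P} \sum_{j=1}^{P} \left( \frac{U_j}{P} \right)^2 = \frac{1}{P} \sum_{j=1}^{P} \left\{ F_P(\underline{X}(j)^-) - \bar{G}_P(\overline{X}(j)) \right\}^2 = \frac{1}{P} \sum_{j=1}^{P} \left\{ F(\underline{X}(j)^-) - \bar{G}(\overline{X}(j)) \right\}^2 + o_p(1).$$

By the law of large numbers and our assumption in (2), we
obtain, as P tends to infinity, that

$$\frac{1}{P} \sum_{j=1}^{P} \left( \frac{U_j}{P} \right)^2 \xrightarrow{p} E\left[ \left\{ F(\underline{X}^-) - \bar{G}(\overline{X}) \right\}^2 \right] > 0. \tag{8}$$

Using (7), $P^{-1} |U_i| \le 2$, $i = 1, \dots, P$, and thus

$$|Y_i| = \frac{|U_i|}{\sqrt{\sum_{j=1}^{P} U_j^2}} = \frac{1}{\sqrt{P}} \, \frac{P^{-1} |U_i|}{\sqrt{P^{-1} \sum_{j=1}^{P} (P^{-1} U_j)^2}} \le \frac{1}{\sqrt{P}} \, \frac{2}{\sqrt{P^{-1} \sum_{j=1}^{P} (P^{-1} U_j)^2}}, \quad i = 1, \dots, P.$$

Using (8) then shows that the $Y_i$ satisfy the third condition of
(5), which completes the proof.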
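
The quantities manipulated in this proof can be checked numerically on simulated doubly censored data. The sketch below is only an illustration, under the assumption that $U_i = \sum_j h(X(i), X(j))$ with $h(x, y) = 1(\underline{x} > \overline{y}) - 1(\overline{x} < \underline{y})$ and $Y_i = U_i / (\sum_j U_j^2)^{1/2}$; the sample size and the way the censoring intervals are drawn are arbitrary choices made for the example.

```python
import math
import random


def kernel(lo_x, hi_x, lo_y, hi_y):
    """h(x, y) = 1(lower_x > upper_y) - 1(upper_x < lower_y)."""
    return int(lo_x > hi_y) - int(hi_x < lo_y)


def normalized_scores(lower, upper):
    """Return Y_i = U_i / sqrt(sum_j U_j^2), with U_i = sum_j h(X(i), X(j))."""
    n = len(lower)
    u = [sum(kernel(lower[i], upper[i], lower[j], upper[j]) for j in range(n))
         for i in range(n)]
    norm = math.sqrt(sum(v * v for v in u))
    return [v / norm for v in u]


# Arbitrary i.i.d. censoring intervals [lower, upper], for illustration only.
rng = random.Random(1)
lower = [rng.random() for _ in range(200)]
upper = [lo + 0.1 * rng.random() for lo in lower]
y = normalized_scores(lower, upper)

print("sum of Y_i      :", sum(y))                 # first condition of (5)
print("sum of Y_i^2    :", sum(v * v for v in y))  # second condition of (5)
print("max of |Y_i|    :", max(abs(v) for v in y)) # third condition of (5)
```

The first two quantities equal 0 and 1 up to rounding for any sample, by anti-symmetry and by normalization, whereas the maximum only becomes small as the sample size grows.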